NSF PAR Search | NSF Public Access Repository

Reconfigurable Stream Network Architecture

https://doi.org/10.1145/3695053.3731088

Wang, Chengyue; Zhang, Xiaofan; Cong, Jason; Hoe, James C (June 2025, ACM)

As AI systems grow increasingly specialized and complex, managing hardware heterogeneity becomes a pressing challenge. How can we efficiently coordinate and synchronize heterogeneous hardware resources to achieve high utilization? How can we minimize the friction of transitioning between diverse computation phases, reducing costly stalls from initialization, pipeline setup, or drain? Our insight is that a network abstraction at the ISA level naturally unifies heterogeneous resource orchestration and phase transitions. This paper presents a Reconfigurable Stream Network Architecture (RSN), a novel ISA abstraction designed for the DNN domain. RSN models the datapath as a circuit-switched network with stateful functional units as nodes and data streaming on the edges. Programming a computation corresponds to triggering a path. Software is explicitly exposed to the compute and communication latency of each functional unit, enabling precise control over data movement for optimizations such as compute-communication overlap and layer fusion. As nodes in a network naturally differ, the RSN abstraction can efficiently virtualize heterogeneous hardware resources by separating control from the data plane, enabling low instruction-level intervention. We build a proof-of-concept design RSN-XNN on VCK190, a heterogeneous platform with FPGA fabric and AI engines. Compared to the SOTA solution on this platform, it reduces latency by 6.1x and improves throughput by 2.4x–3.2x. Compared to the T4 GPU with the same FP32 performance, it matches latency with only 18% of the memory bandwidth. Compared to the A100 GPU at the same 7nm process node, it achieves 2.1x higher energy efficiency in FP32.
Free, publicly-accessible full text available June 20, 2026
HerQules: securing programs via hardware-enforced message queues

https://doi.org/10.1145/3445814.3446736

Chen, Daming D.; Lim, Wen Shih; Bakhshalipour, Mohammad; Gibbons, Phillip B.; Hoe, James C.; Parno, Bryan (April 2021, ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

Full Text Available
We need kernel interposition over the network dataplane

https://doi.org/10.1145/3458336.3465281

Sadok, Hugo; Zhao, Zhipeng; Choung, Valerie; Atre, Nirav; Berger, Daniel S.; Hoe, James C.; Panda, Aurojit; Sherry, Justine (June 2021, HotOS '21: Proceedings of the Workshop on Hot Topics in Operating Systems)
Kernel-bypass network APIs, which allow applications to circumvent the kernel and interface directly with the NIC hardware, have recently emerged as one of the main tools for improving application network performance. However, allowing applications to circumvent the kernel makes it impossible to use tools (e.g., tcpdump) or impose policies (e.g., QoS and filters) that need to consider traffic sent by different applications running on a host. This makes maintainability and manageability a challenge for kernel-bypass applications. In response we propose Kernel On-Path Interposition (KOPI), in which traditional kernel dataplane functionality is retained but implemented in a fully programmable SmartNIC. We hypothesize that KOPI can support the same tools and policies as the kernel stack while retaining the performance benefits of kernel bypass.
Full Text Available
